A unified data mining solution for authorship analysis in anonymous textual communications

Authors

  • Farkhund Iqbal
  • Hamad Binsalleeh
  • Benjamin C. M. Fung
  • Mourad Debbabi

Abstract

The cyber world provides an anonymous environment for criminals to conduct malicious activities such as spamming, sending ransom e-mails, and spreading botnet malware. Often, these activities involve textual communication between a criminal and a victim, or among criminals themselves. The forensic analysis of online textual documents to address this anonymity problem, called authorship analysis, is the focus of most cybercrime investigations. Authorship analysis is the statistical study of the linguistic and computational characteristics of the written documents of individuals. This paper is the first work to present a unified data mining solution to authorship analysis problems based on the concept of a frequent pattern-based writeprint. Extensive experiments on real-life data suggest that our proposed solution can precisely capture the writing styles of individuals. Furthermore, the writeprint is effective in identifying the author of an anonymous text from a group of suspects and in inferring sociolinguistic characteristics of the author.

Preprint submitted to Information Sciences, March 10, 2011.
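The abstract does not spell out how a frequent pattern-based writeprint is built or used. As a rough illustration only, and not the authors' algorithm, the Python sketch below assumes that each message has already been turned into a set of discretized stylometric items (all item names are hypothetical). It collects the itemsets that occur in at least `min_support` of an author's messages, keeps only those that are not frequent for any other suspect, and attributes an anonymous message to the suspect whose writeprint it matches best. The names `frequent_patterns`, `writeprints`, `attribute`, `min_support`, `max_size`, and the scoring rule are assumptions, and the brute-force itemset enumeration stands in for a proper frequent-itemset miner such as Apriori.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Hypothetical representation: a message is a frozenset of discretized
# stylometric items (e.g. "punct:!!"); feature extraction is not shown.
Itemset = FrozenSet[str]


def frequent_patterns(messages: List[Itemset], min_support: float,
                      max_size: int = 2) -> Set[Itemset]:
    """Itemsets of up to max_size items found in at least a min_support
    fraction of the messages (brute-force enumeration, not Apriori)."""
    counts: Counter = Counter()
    for msg in messages:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(msg), k):
                counts[frozenset(combo)] += 1
    threshold = min_support * len(messages)
    return {p for p, c in counts.items() if c >= threshold}


def writeprints(corpus: Dict[str, List[Itemset]], min_support: float = 0.5,
                max_size: int = 2) -> Dict[str, Set[Itemset]]:
    """Writeprint (as assumed here): an author's frequent patterns that are
    not frequent for any other suspect in the corpus."""
    freq = {a: frequent_patterns(m, min_support, max_size)
            for a, m in corpus.items()}
    prints = {}
    for author, pats in freq.items():
        # Union of every other suspect's frequent patterns.
        others = set().union(*(p for b, p in freq.items() if b != author))
        prints[author] = pats - others
    return prints


def attribute(message: Itemset, prints: Dict[str, Set[Itemset]]) -> str:
    """Return the suspect whose writeprint patterns are most often contained
    in the anonymous message (hypothetical scoring rule)."""
    def score(pats: Set[Itemset]) -> float:
        if not pats:
            return 0.0
        return sum(1 for p in pats if p <= message) / len(pats)
    return max(prints, key=lambda a: score(prints[a]))


# Toy data with made-up items, for illustration only.
corpus = {
    "alice": [frozenset({"greet:hi", "punct:!!", "case:lower"}),
              frozenset({"greet:hi", "punct:!!", "len:short"}),
              frozenset({"greet:hi", "case:lower", "punct:!!"})],
    "bob": [frozenset({"greet:dear", "punct:.", "len:long"}),
            frozenset({"greet:dear", "punct:.", "case:lower"}),
            frozenset({"greet:dear", "len:long", "punct:."})],
}
prints = writeprints(corpus, min_support=0.6)
anonymous = frozenset({"greet:dear", "punct:.", "len:short"})
print(attribute(anonymous, prints))  # → bob
```

In this toy corpus no pattern is frequent for both suspects, so nothing is filtered out; the anonymous message contains three of Bob's six writeprint patterns and none of Alice's. Discarding patterns that are frequent for several suspects is what makes the remaining ones discriminative. A real system would also need a feature extraction step and a policy for ties and empty writeprints.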

Similar resources

Graph-based and Lexical-Syntactic Approaches for the Authorship Attribution Task

In this paper we present two different approaches for tackling the authorship attribution task. The first approach uses a set of phrase-level lexical-syntactic features, whereas the second approach considers a graph-based text representation together with a data mining technique for discovering authorship patterns which may be further used for attributing the authorship of an anonymous document....

Plagiarism and authorship analysis: introduction to the special issue

The Internet has facilitated both the dissemination of anonymous texts as well as easy “borrowing” of ideas and words of others. This has raised a number of important questions regarding authorship. Can we identify the anonymous author of a text by comparing the text with the writings of known authors? Can we determine if a text, or parts of it, has been plagiarized? Such questions are clearl...

Data Mining Instant Messaging Communications to Perform Author Identification for Cybercrime Investigations

Instant messaging is a form of computer-mediated communication (CMC) with unique characteristics that reflect a realistic presentation of an author’s online stylistic characteristics. Instant messaging communications use virtual identities, which hinder social accountability and facilitate IM-related cybercrimes. Criminals often use virtual identities to hide their true identity and may also su...

A Novel Approach of Mining Write-Prints for Authorship Attribution in E-mail Forensics

There is an alarming increase in the number of cybercrime incidents through anonymous e-mails. The problem of e-mail authorship attribution is to identify the most plausible author of an anonymous e-mail from a group of potential suspects. Most previous contributions employed a traditional classification approach, such as decision tree and Support Vector Machine (SVM), to identify the author an...

A unified theoretical harmonic analysis approach to the cyclic wavelet transform (CWT) for periodic signals of prime dimensions

The article introduces cyclic dilation groups and finite affine groups for prime integers, and as an application of this theory it presents a unified group-theoretical approach to the cyclic wavelet transform (CWT) of prime-dimensional periodic signals.

Journal:
  • Inf. Sci.

Volume 231

Publication date: 2013